66 research outputs found

    Classification and Error Estimation for Discrete Data

    Discrete classification is common in Genomic Signal Processing applications, in particular in the classification of discretized gene expression data, in discrete gene expression prediction, and in the inference of Boolean genomic regulatory networks. Once a discrete classifier is obtained from sample data, its performance must be evaluated through its classification error. In practice, error estimation methods must then be employed to obtain reliable estimates of the classification error based on the available data. In Genomics, both classifier design and error estimation are complicated by the prevalence of small-sample data sets. This paper presents a broad review of the methodology of classification and error estimation for discrete data, in the context of Genomics, focusing on the study of performance in small-sample scenarios as well as on asymptotic behavior.

    Rank discriminants for predicting phenotypes from RNA expression

    Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, and thereby increase accuracy, and might provide a link with biological mechanism. We consider a general framework ("rank-in-context") for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support ("context"). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive. Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
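The single-comparison rule mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_tsp` and `predict_tsp`, the scoring by absolute difference in the within-class frequency of `x[i] < x[j]`, and the toy data are all assumptions made for illustration. Labels are assumed to be 0/1, with both classes present in the training set.

```python
from itertools import combinations


def fit_tsp(X, y):
    """Pick the gene pair (i, j) whose ordering best separates the two classes.

    X is a list of samples (each a list of expression values); y holds 0/1 labels.
    Hypothetical helper: the score is |P(x_i < x_j | class 1) - P(x_i < x_j | class 0)|.
    """
    n_genes = len(X[0])
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    best = None
    for i, j in combinations(range(n_genes), 2):
        # Within-class frequency of the event "gene i is expressed below gene j".
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        score = abs(p1 - p0)
        if best is None or score > best[0]:
            # Remember which ordering is associated with class 1.
            best = (score, i, j, p1 > p0)
    _, i, j, less_means_class1 = best
    return i, j, less_means_class1


def predict_tsp(model, x):
    """Classify one sample using only the ordering of the selected pair."""
    i, j, less_means_class1 = model
    return int((x[i] < x[j]) == less_means_class1)


# Toy data (invented): three genes, two samples per class.
X_train = [[1, 5, 3], [2, 6, 9], [7, 3, 4], [8, 2, 1]]
y_train = [0, 0, 1, 1]
model = fit_tsp(X_train, y_train)
print(model, predict_tsp(model, [0, 10, 5]), predict_tsp(model, [9, 1, 5]))
```

Because the decision depends only on which of two genes is expressed more highly, the rule is unchanged by any monotone transformation of a sample's expression values. Ties between pairs are broken here by taking the first pair found, which is an arbitrary choice.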

    The Illusion of Distribution-Free Small-Sample Classification in Genomics

    Classification has emerged as a major area of investigation in bioinformatics owing to the desire to discriminate phenotypes, in particular, disease conditions, using high-throughput genomic data. While many classification rules have been posed, there is a paucity of error estimation rules and an even greater paucity of theory concerning error estimation accuracy. This is problematic because the worth of a classifier depends mainly on its error rate. It is commonplace in bioinformatics papers to have a classification rule applied to a small labeled data set and the error of the resulting classifier estimated on the same data set, most often via cross-validation, without any assumptions being made on the underlying feature-label distribution. Concomitant with a lack of distributional assumptions is the absence of any statement regarding the accuracy of the error estimate. Without such a measure of accuracy, the most common one being the root-mean-square (RMS), the error estimate is essentially meaningless and the worth of the entire paper is questionable. The concomitance of an absence of distributional assumptions and of a measure of error estimation accuracy is assured in small-sample settings because even when distribution-free bounds exist (and that is rare), the sample sizes required under the bounds are so large as to make them useless for small samples. Thus, distributional bounds are necessary and the distributional assumptions need to be stated. Owing to the epistemological dependence of classifiers on the accuracy of their estimated errors, scientifically meaningful distribution-free classification in high-throughput, small-sample biology is an illusion.

    From Functional Genomics to Functional Immunomics: New Challenges, Old Problems, Big Rewards

    The development of DNA microarray technology a decade ago led to the establishment of functional genomics as one of the most active and successful scientific disciplines today. With the ongoing development of immunomic microarray technology (a spatially addressable, large-scale technology for measurement of specific immunological response), the new challenge of functional immunomics is emerging, which bears similarities to but is also significantly different from functional genomics. Immunomic data has been successfully used to identify biological markers involved in autoimmune diseases, allergies, viral infections such as human immunodeficiency virus (HIV), influenza, diabetes, and responses to cancer vaccines. This review intends to provide a coherent vision of this nascent scientific field, and to speculate on future research directions. We discuss at some length issues such as epitope prediction, immunomic microarray technology and its applications, and computational and statistical challenges related to functional immunomics. Based on the recent discovery of regulation mechanisms in T cell responses, we envision the use of immunomic microarrays as a tool for advances in systems biology of cellular immune responses, by means of immunomic regulatory network models.

    Reliable Classifier to Differentiate Primary and Secondary Acute Dengue Infection Based on IgG ELISA

    Dengue virus infection causes a wide spectrum of illness, ranging from sub-clinical to severe disease. Severe dengue is associated with sequential viral infections. A strict definition of primary versus secondary dengue infections requires a combination of several tests performed at different stages of the disease, which is not practical. We developed a simple method to classify dengue infections as primary or secondary based on the levels of dengue-specific IgG. A group of 109 dengue infection patients were classified as having primary or secondary dengue infection on the basis of a strict combination of results from assays of antigen-specific IgM and IgG, isolation of virus and detection of the viral genome by PCR tests performed on multiple samples, collected from each patient over a period of 30 days. The dengue-specific IgG levels of all samples from 59 of the patients were analyzed by linear discriminant analysis (LDA), and one- and two-dimensional classifiers were designed. The one-dimensional classifier was estimated by bolstered resubstitution error estimation to have 75.1% sensitivity and 92.5% specificity. The two-dimensional classifier was designed by also taking into consideration the number of days after the onset of symptoms, with an estimated sensitivity and specificity of 91.64% and 92.46%, respectively. The performance of the two-dimensional classifier was validated using an independent test set of standard samples from the remaining 50 patients. The classifications of the independent set of samples determined by the two-dimensional classifier were further validated by comparing with two other dengue classification methods: hemagglutination inhibition (HI) assay and an in-house anti-dengue IgG-capture ELISA method. The decisions made with the two-dimensional classifier were in 100% accordance with the HI assay and 96% with the in-house ELISA. Once acute dengue infection has been determined, a 2-D classifier based on common dengue virus IgG kits can reliably distinguish primary and secondary dengue infections. Software for calculation and validation of the 2-D classifier is made available for download.

    T-Cell Memory Responses Elicited by Yellow Fever Vaccine are Targeted to Overlapping Epitopes Containing Multiple HLA-I and -II Binding Motifs

    The yellow fever vaccines (YF-17D-204 and 17DD) are considered to be among the safest vaccines, and the presence of neutralizing antibodies is correlated with protection, although other immune effector mechanisms are known to be involved. T-cell responses are known to play an important role in modulating antibody production and the killing of infected cells. However, little is known about the repertoire of T-cell responses elicited by the YF-17DD vaccine in humans. In this report, a library of 653 partially overlapping 15-mer peptides covering the envelope (Env) and nonstructural (NS) proteins 1 to 5 of the vaccine was utilized to perform a comprehensive analysis of the virus-specific CD4+ and CD8+ T-cell responses. The T-cell responses were screened ex-vivo by IFN-γ ELISPOT assays using blood samples from 220 YF-17DD vaccinees collected two months to four years after immunization. Each peptide was tested in 75 to 208 separate individuals of the cohort. The screening identified sixteen immunodominant antigens that elicited activation of circulating memory T-cells in 10% to 33% of the individuals. Biochemical in-vitro binding assays and immunogenetic and immunogenicity studies indicated that each of the sixteen immunogenic 15-mer peptides contained two or more partially overlapping epitopes that could bind with high affinity to molecules of different HLAs. The prevalence of the immunogenicity of a peptide in the cohort was correlated with the diversity of HLA-II alleles that it could bind. These findings suggest that overlapping of HLA binding motifs within a peptide enhances its T-cell immunogenicity and the prevalence of the response in the population. In summary, the results suggest that, in addition to factors of the innate immunity, "promiscuous" T-cell antigens might contribute to the high efficacy of the yellow fever vaccines. © 2013 de Melo et al.

    Gene Expression Profiling during Early Acute Febrile Stage of Dengue Infection Can Predict the Disease Outcome

    Background: We report the detailed development of biomarkers to predict the clinical outcome under dengue infection. Transcriptional signatures from purified peripheral blood mononuclear cells were derived from whole-genome gene-expression microarray data, validated by quantitative PCR and tested in independent samples. Methodology/Principal Findings: The study was performed on patients of a well-characterized dengue cohort from Recife, Brazil. The samples analyzed were collected prospectively from acute febrile dengue patients who evolved with different degrees of disease severity, classic dengue fever or dengue hemorrhagic fever (DHF); these samples were compared with similar samples from other non-dengue febrile illnesses. The DHF samples were collected 2-3 days before the presentation of the plasma leakage symptoms. Differentially expressed genes were selected by univariate statistical tests as well as multivariate classification techniques. The results showed that at early stages of dengue infection, the genes involved in effector mechanisms of innate immune response presented a weaker activation in patients who later developed hemorrhagic fever, whereas the genes involved in apoptosis were expressed at higher levels. Conclusions/Significance: Some of the gene expression signatures displayed estimated accuracy rates of more than 95%, indicating that expression profiling with these signatures may provide a useful means of DHF prognosis at early stages of infection. © 2009 Nascimento et al.

    Description of a Prospective 17DD Yellow Fever Vaccine Cohort in Recife, Brazil

    From September 2005 to March 2007, 238 individuals being vaccinated for the first time with the yellow fever (YF)-17DD vaccine were enrolled in a cohort established in Recife, Brazil. A prospective study indicated that, after immunization, anti-YF immunoglobulin M (IgM) and anti-YF IgG were present in 70.6% (IgM) and 98.3% (IgG) of the vaccinated subjects. All vaccinees developed protective immunity, which was detected by the plaque reduction neutralization test (PRNT) with a geometric mean titer of 892. Of the 238 individuals, 86.6% had IgG antibodies to dengue virus; however, the presence of anti-dengue IgG did not interfere significantly with the development of anti-YF neutralizing antibodies. In a separate retrospective study of individuals immunized with the 17DD vaccine, the PRNT values at 5 and 10 years post-vaccination remained positive but showed a significant decrease in neutralization titer (25% with PRNT titers < 100 after 5 years and 35% after 10 years).